MTTK: An Alignment Toolkit for Statistical Machine Translation
نویسندگان
چکیده
The MTTK alignment toolkit for statistical machine translation can be used for word, phrase, and sentence alignment of parallel documents. It is designed mainly for building statistical machine translation systems, but can be exploited in other multi-lingual applications. It provides computationally efficient alignment and estimation procedures that can be used for the unsupervised alignment of parallel text collections in a language independent fashion. MTTK Version 1.0 is available under the Open Source Educational Community License.
منابع مشابه
BIA: a Discriminative Phrase Alignment Toolkit
In most statistical machine translation systems, bilingual segments are extracted via word alignment. However, word alignment is performed independently from the requirements of the machine translation task. Furthermore, although phrase-based translation models have replacedword-based translationmodels nearly ten years ago, word-basedmodels are still widely used for word alignment. In this pape...
متن کاملPostCAT - Posterior Constrained Alignment Toolkit
In this paper we present a new open-source toolkit for statistical word alignments Posterior Constrained Alignment Toolkit (PostCAT). e toolkit implements three well known word alignment algorithms (IBM M1, IBM M2, HMM) as well as six new models. In addition to the usual Viterbi decoding scheme, the toolkit provides posterior decoding with several flavors for tuning the threshold. e toolkit a...
متن کاملMT-EQuAl: a Toolkit for Human Assessment of Machine Translation Output
MT-EQuAl (Machine Translation Errors, Quality, Alignment) is a toolkit for human assessment of Machine Translation (MT) output. MT-EQuAl implements three different tasks in an integrated environment: annotation of translation errors, translation quality rating (e.g. adequacy and fluency, relative ranking of alternative translations), and word alignment. The toolkit is webbased and multi-user, a...
متن کاملTraining a Statistical Machine Translation System without GIZA++
The IBM Models (Brown et al., 1993) enjoy great popularity in the machine translation community because they offer high quality word alignments and a free implementation is available with the GIZA++ Toolkit (Och and Ney, 2003). Several methods have been developed to overcome the asymmetry of the alignment generated by the IBM Models. A remaining disadvantage, however, is the high model complexi...
متن کاملLinguistically Motivated Unsupervised Segmentation for Machine Translation
In this paper we use statistical machine translation and morphology information from two different morphological analyzers to try to improve translation quality by linguistically motivated segmentation. The morphological analyzers we use are the unsupervised Morfessor morpheme segmentation and analyzer toolkit and the rule-based morphological analyzer T3. Our translations are done using the Mos...
متن کامل